Revisiting the Spatial and Temporal Modeling for Few-Shot Action Recognition

Abstract

Spatial and temporal modeling is one of the most core aspects of few-shot action recognition. Most previous works mainly focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. Actually, the former feature could bring rich local semantic information, and the latter feature could represent motion characteristics of adjacent frames, respectively. In this paper, we propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module to automatically search for the best combination of the low-level and high-level spatial features. Next, inspired by the recent transformer, we introduce a long-term temporal modeling module to model the global temporal relations based on the extracted spatial appearance features. Meanwhile, we design another short-term temporal modeling module to encode the motion characteristics between adjacent frame representations. After that, the final predictions can be obtained by feeding the embedded rich spatial-temporal features to a common frame-level class prototype matcher. We extensively validate the proposed SloshNet on four few-shot action recognition datasets, including Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable results against state-of-the-art methods in all datasets.
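The two ideas at the end of the pipeline, motion cues from adjacent frames and matching against class prototypes, can be illustrated with a small sketch. This is not the SloshNet implementation: it assumes precomputed per-frame feature vectors, replaces the learned short-term module with plain adjacent-frame differences, and uses a simplified video-level prototype with cosine similarity in place of the frame-level matcher. All function names (`frame_differences`, `video_embedding`, `classify`) are hypothetical.

```python
import math


def frame_differences(frames):
    """Short-term motion cue: difference between each pair of adjacent frames."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def video_embedding(frames):
    """Concatenate mean appearance with mean adjacent-frame motion.

    Assumes at least two frames per video (an assumption of this sketch).
    """
    appearance = mean_vector(frames)
    motion = mean_vector(frame_differences(frames))
    return appearance + motion


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(query_frames, support):
    """Return the label whose prototype is most similar to the query.

    `support` maps each class label to a list of videos, where a video is
    a list of per-frame feature vectors. A prototype is the mean embedding
    of that class's support videos.
    """
    query = video_embedding(query_frames)
    prototypes = {
        label: mean_vector([video_embedding(video) for video in videos])
        for label, videos in support.items()
    }
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy 2-way 1-shot episode with 2-dimensional frame features.
support_set = {
    "push": [[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]],
    "lift": [[[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]],
}
query_video = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]]
predicted = classify(query_video, support_set)
```

In this toy episode the query moves mostly along the first feature dimension, so its embedding lies closest to the "push" prototype. A learned model would replace the hand-written differences and averaging with trained modules.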

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i3.25403